Building a Proxy Pool in Python: Getting Past IP Blocks to Scrape Large Volumes of Data (Project Source Code Included)
http://www.ip3366.net/?stype=1&page=1
# Get the 10 IP rows on the current page
ips = selector.xpath('//*[@id="list"]/table/tbody/tr')
print(len(ips))
'''
10
'''
# Extract the IP and port from each row
for ip in ips:
    ip_num = ip.xpath('td[1]/text()').get()  # ip
    port_num = ip.xpath('td[2]/text()').get()  # port
    print(ip_num, port_num)
'''
49.70.151.180 3256
49.87.44.221 9999
42.177.142.239 9999
42.177.141.141 9999
42.176.134.43 9999
42.176.134.212 9999
49.71.142.114 9999
49.87.221.46 9999
49.87.221.120 9999
49.87.221.61 9999
'''
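The excerpt goes straight from the scraped IP/port pairs to an `ip_list` whose items are passed to `proxies=`, so each item has to be a dict in the form `requests` expects. Below is a minimal sketch of that conversion, assuming plain HTTP proxies. `build_proxies` is a hypothetical helper name, not taken from the original project, and the two sample rows are copied from the output above.

```python
def build_proxies(ip_num, port_num):
    """Combine one scraped IP and port into a requests-style proxies dict.

    Hypothetical helper: assumes the proxy speaks plain HTTP.
    """
    address = f'http://{ip_num.strip()}:{port_num.strip()}'
    # Register the same proxy address for both URL schemes
    return {'http': address, 'https': address}


# Two (ip, port) pairs taken from the scraped output
rows = [('49.70.151.180', '3256'), ('49.87.44.221', '9999')]
ip_list = [build_proxies(ip_num, port_num) for ip_num, port_num in rows]
print(ip_list[0])
```

Each element of `ip_list` can then be passed directly as the `proxies` argument of `requests.get`.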
for page in range(1, 10+1):
    print(f'-------Scraping page {page}-------')
    url = f'http://www.ip3366.net/?stype=1&page={page}'
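use_proxy = []  # proxies that pass the check
# ip_list holds the requests-style proxies dicts built from the scraped rows
for count, ip in enumerate(ip_list, start=1):
    try:
        response = requests.get(url='https://www.baidu.com', proxies=ip, timeout=2)
    except requests.RequestException:
        print(f'Proxy #{count}:', ip, 'request failed or timed out, check failed!!!')
    else:
        if response.status_code == 200:
            use_proxy.append(ip)
            print(f'Proxy #{count}:', ip, 'check passed')
        else:
            print(f'Proxy #{count}:', ip, f'returned status {response.status_code}, check failed')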
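Checking proxies one at a time means every dead proxy can cost up to the full 2-second timeout. As a possible variation, not part of the original project, the same check could run in a thread pool. In this sketch `filter_working` and `check_with_requests` are hypothetical names, and `requests` is imported inside the checker so the pool helper itself depends only on the standard library.

```python
from concurrent.futures import ThreadPoolExecutor


def filter_working(ip_list, check, max_workers=20):
    """Return the proxies in ip_list for which check(proxy) is truthy.

    check is supplied by the caller; max_workers=20 is an arbitrary default.
    """
    if not ip_list:
        return []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with ip_list
        results = list(pool.map(check, ip_list))
    return [proxy for proxy, ok in zip(ip_list, results) if ok]


def check_with_requests(proxy, test_url='https://www.baidu.com', timeout=2):
    """Same test as the sequential loop: True only on a 200 response."""
    import requests  # third-party; imported lazily, see the note above
    try:
        response = requests.get(test_url, proxies=proxy, timeout=timeout)
    except requests.RequestException:
        return False
    return response.status_code == 200
```

Under these assumptions, `use_proxy = filter_working(ip_list, check_with_requests)` would replace the sequential loop. Results come back in the original order, but the per-proxy progress messages are not printed.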